APPARATUS AND METHOD OF ADDITIVE SYNTHESIS OF DIGITAL
AUDIO SIGNALS USING A RECURSIVE DIGITAL OSCILLATOR 



[001] BRIEF DESCRIPTION OF THE INVENTION 



[002] This invention relates generally to the processing of digital audio signals.
More particularly, this invention relates to a technique for additive synthesis of digital
audio signals using a recursive digital oscillator.



[003] BACKGROUND OF THE INVENTION



[004] Additive synthesis is a signal synthesis technique based on the Fourier
Theorem. This theorem states that any signal can be decomposed into a set of constituent sine
waves, and that the sum of the constituents will reconstitute the original. Additive
synthesis is classified as a receiver-based synthesis algorithm, but differs from other
receiver-based schemes, such as subtractive synthesis and sampling, in that it is represented
in the spectral (frequency) domain rather than the time domain.

[005] There are many benefits in the use of additive synthesis for sound
production in computer music applications. These include expressive musical control
over fine timbral distinctions, perceptually relevant parameterizations, sample rate
independence of timbre description, availability of many analysis techniques, high
control bandwidth, and multiple dimensions for resource allocation/optimization.

[006] The challenge of the additive synthesis technique is the computational
intensity of the separately controllable sinusoidal partials. A single low frequency piano
note can require hundreds of time-varying sinusoids for accurate reproduction. Musically
effective use of additive synthesis in live performance can require the ability to control
many hundreds or even thousands of sinusoidal partials in real-time.

[007] This computational challenge is addressed by resolving two issues: which
hardware architecture to use and which sinusoid generation algorithm to use on the
selected architecture. Digital Signal Processors or vector processors are a good selection
for the data type and associated computational demands. Unfortunately, such
architectures do not always support full-range (i.e., floating-point) arithmetic; fixed point
may be all that is provided. There is always a large demand for low-cost
implementations. Therefore, it is desirable to be able to exploit a relatively inexpensive,
moderate-precision arithmetic hardware architecture, such as a 16-bit processor.

[008] A number of sinusoidal partial production techniques may be used on a
selected hardware architecture. These techniques can be placed in three classes: those
that implement recursive filters, those using table-lookup, and those that work in the
transform domain using techniques such as the inverse fast Fourier transform. The
transform-domain approach is most advantageous for applications requiring many
sinusoids and for which some error in phase and amplitude and some latency are
acceptable. The lookup technique is the most widely used for applications requiring a
few sinusoids at a very high data rate, such as radio frequency communications.
Recursive oscillators have several advantages, including the inherent fine-grain exposure
of data parallelism, the far more limited demand on the memory system compared to
table look-ups, the lower induced latency than with a transform-domain approach, the
latency flexibility, and/or the attainable phase accuracy.

[009] The primary problem with digital recursive oscillators is managing
long-term stability as rounding and truncation errors accumulate. Another problem with
recursive oscillators is providing sufficient frequency coefficient resolution.

[0010] In view of the foregoing, it would be highly desirable to provide an
improved technique for processing real-time partials on a general purpose hardware
architecture. Ideally, the technique could be readily implemented on a
moderate-precision arithmetic hardware architecture, such as a 16-bit processor. The
technique should address the problem of error accumulation inherent in recursive
methods. In addition, the technique should provide sufficient frequency coefficient
resolution.

[0011] SUMMARY OF THE INVENTION 

[0012] The method of the invention is directed toward performing additive
synthesis of digital audio signals with a recursive digital oscillator. The method includes
the step of receiving digital audio signal frames wherein each digital audio signal frame
includes a set of frequency, amplitude, and phase components represented as coefficients
of variables in a mathematical expression. Each digital audio signal frame thereby
includes a frequency coefficient representation. Converted frequency coefficients are
formed by linearly re-mapping bits of the frequency coefficient representation to bias
audio reproduction accuracy toward low frequency signals. Additive synthesis is then
performed with the converted frequency coefficients.

[0013] The method of the invention also includes receiving digital audio signal
frames wherein each digital audio signal frame includes a set of frequency, amplitude,
and phase components, represented as coefficients in the standard mathematical
expression of the Fourier theorem; the step of converting frequency components of each
digital audio signal frame to bias reproduction accuracy toward lower frequencies in the
audio spectrum through the use of a re-mapping of the bits of the component and through
the addition of a range-extending shift amount; and the step of performing additive
synthesis via the use of an efficient recursive digital oscillator structure that uses the
converted frequency coefficients internally.

[0014] The apparatus of the invention includes a computer readable memory to
direct a processor to function in a specified manner. The computer readable memory
includes a first set of executable instructions to receive digital audio signal frames
wherein each digital audio signal frame has a set of specified frequency values expressed
as a bit sequence. A second set of executable instructions transforms the bit sequence to
represent lower frequencies with more significant bits and higher frequencies with less
significant bits. A third set of executable instructions facilitates additive synthesis of the
digital audio signal frames in a reduced-precision recursive digital oscillator. Sound is
produced as multiple recursive oscillators operate in parallel.

[0015] The invention provides an improved technique for real-time production of
summed variable-frequency sinusoids on a general purpose hardware architecture. The
technique is readily implemented on a moderate-precision arithmetic hardware
architecture, such as a 16-bit processor, but is also successfully implemented on a
variety of hardware architectures. The technique of the invention addresses the problem
of error accumulation inherent in recursive oscillation, the problem of providing adequate
frequency coefficient resolution inside individual oscillators, and the problem of
providing computationally efficient additive synthesis on a variety of hardware platforms.

[0016] BRIEF DESCRIPTION OF THE DRAWINGS 

[0017] For a better understanding of the invention, reference should be made to
the following detailed description taken in conjunction with the accompanying drawings,
in which:

[0018] FIGURE 1 illustrates an apparatus for implementing an embodiment of the
invention.

[0019] FIGURE 2 illustrates an embodiment of the invention in the context of an
analysis/re-synthesis framework, and identifies the processes in the framework that
require real-time performance.

[0020] FIGURE 3 illustrates overlapping audio frames processed in accordance
with an embodiment of the invention.

[0021] FIGURE 4 illustrates the partitioning of a theta term into alpha and beta
components in accordance with an embodiment of the invention.

[0022] FIGURE 5 illustrates a comparison of original absolute error due to 
coefficient quantization error, and the modified error achieved in accordance with an 
embodiment of the invention. 


[0023] FIGURE 6 is a detailed illustration of the modified absolute error due to 
coefficient quantization error achieved in accordance with an embodiment of the 
invention. 

[0024] Like reference numerals refer to corresponding parts throughout the
drawings.

[0025] DETAILED DESCRIPTION OF THE INVENTION 

[0026] Figure 1 illustrates an apparatus 20 that may be used to implement an
embodiment of the invention. The apparatus 20 includes the components associated with
a general purpose computer. In particular, the apparatus 20 includes a processor 22,
many variations of which are discussed below. The processor 22 is connected to a set of
input/output devices 24 via a bus 26. The input/output devices 24 may include such
components as a keyboard, mouse, speakers, video monitor, and the like.

[0027] A memory (primary and/or secondary) 28 is connected to the bus 26. The
memory 28 stores a set of executable instructions used to implement the processing of the
invention. In particular, the memory 28 stores a non-real-time processing module 30 to
perform prior art processing of the type described below. In accordance with the
invention, the memory 28 also stores a frequency coefficient conversion module 32. As
discussed below, the frequency coefficient conversion module 32 re-maps bits of a
frequency coefficient representation to bias audio reproduction accuracy at the
input/output devices 24 toward low frequency signals. An additive synthesizer 34 built
using a new formulation of a prior art recursive oscillation technique is then used to
process the linearly re-mapped bits of the frequency coefficient representation. For the
purpose of convenience, the invention is frequently described in the context of a single
recursive oscillator. This reference to a single recursive oscillator contemplates the use
of multiple recursive oscillators operating in parallel to produce sound, as understood
from the following discussion.

[0028] The invention is directed toward the frequency coefficient conversion
module 32 and the additive synthesizer 34, which efficiently creates sound based on the
output of the conversion module 32. The context in which this module operates and the
operations that it performs are more fully appreciated with reference to Figure 2.
Figure 2 illustrates an example of a complete additive analysis/synthesis system
framework. The steps to the right of the thick dashed line 50 are computed in real-time
by the frequency coefficient conversion module 32 and the additive synthesizer 34. The
steps to the left of the line 50 are performed by the non-real-time processing module 30.

[0029] The concept behind this particular separation is that a set of sound
primitives, expressed as sets of overlap-add frames, called timbral prototypes, can be
generated off-line via the non-real-time steps as part of the compositional process. Then
at performance time, sets of timbral prototypes are loaded into and out of memory 28
according to a score, where they can be manipulated and combined in response to
controller input from a performer operating the input/output devices 24. The modified
frames are then synthesized in real-time for subsequent audition. The use of such a
paradigm enables more degrees of freedom in performance than are available through,
for example, conventional sample-playback-based synthesis.

Atty, Doc. UCB-ll/APP(B96-033-2)
Appl. No. 09/521,641
***SUBSTITUTE SPECIFICIFICATION***
Reply to Office Action of Oct. 29, 2003

[0030] The processing associated with the present invention is directed toward the
final step, that of taking a set of dynamically changing frames and synthesizing them into
audio samples. The challenge of using a vector instruction set architecture is explicitly
managing parallelism due to the independence of sinusoid computations. The technique of
the invention exploits the natural coarse-grained parallelism by choosing to stripe state
variables of sinusoids across the length of the vectors. The technique of the invention
allows for implementation on a moderate-precision arithmetic unit (e.g., a 16-bit
processor) using moderate-precision numeric representations. In particular, the invention
provides sufficient frequency coefficient resolution by modifying a standard recursive
form. The technique also reduces quantization-induced noise effects by keeping
oscillators short-lived in order to exploit short-term fidelity.

[0031] The input to the frequency coefficient conversion module 32 is a series of
variable-length overlap-add frames. A succession of such frames constitutes a timbral
prototype, which is either synthetically designed or derived through a separate analysis
phase, as depicted in Figure 2. The analysis phase may include the generation of a sound
(block 60) from which spectral estimation is used to produce a set of fast Fourier
transforms (block 62). Pitch is then detected to produce a set of pitch estimates
(block 64). The pitch estimates are then used to identify new window lengths associated
with the spectral estimation. This results in a new set of fast Fourier transforms
(block 66). Peak detection is performed for the new fast Fourier transforms to produce
new peak estimates (block 68). Smoothing is then performed on the peaks and an
overlap-add frame operation (block 70) is then initiated. The frequency coefficient
conversion module 32 then performs coefficient re-mapping and frame stitching
(overlap-add frames) (block 72), as discussed below. Sound 80 corresponding to the
initial sound (block 60) may then be produced with an additive synthesizer 34 based on
the recursive oscillator structure described below.

[0032] Each frame consists of a frame header and frame data. The frame header
is a double-precision floating point time stamp denoting the start time of the frame and an
integer denoting the number of partials in it. The frame data is a list containing the fixed
frequency, peak amplitude, and initial phase for each sinusoid in the frame, all in
single-precision floating point.
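
As a rough illustration of this frame layout, the following Python sketch models a frame as a header plus a list of partials and packs it into bytes. The class names `Frame` and `Partial`, the little-endian byte order, and the 32-bit width of the partial count are assumptions made for this sketch; the specification fixes only the field types.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partial:
    frequency_hz: float    # fixed frequency for the life of the frame
    peak_amplitude: float  # peak of the amplitude envelope
    initial_phase: float   # radians


@dataclass
class Frame:
    start_time: float                       # double-precision time stamp
    partials: List[Partial] = field(default_factory=list)


# Assumed byte layout (not specified in the text): little-endian,
# float64 time stamp followed by a uint32 partial count.
HEADER = struct.Struct("<dI")
# Three float32 values per sinusoid: frequency, amplitude, phase.
PARTIAL = struct.Struct("<fff")


def pack_frame(frame: Frame) -> bytes:
    """Serialize a frame as header followed by one record per partial."""
    chunks = [HEADER.pack(frame.start_time, len(frame.partials))]
    for p in frame.partials:
        chunks.append(PARTIAL.pack(p.frequency_hz, p.peak_amplitude, p.initial_phase))
    return b"".join(chunks)


def unpack_frame(data: bytes) -> Frame:
    """Inverse of pack_frame."""
    start_time, count = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    partials = []
    for _ in range(count):
        partials.append(Partial(*PARTIAL.unpack_from(data, offset)))
        offset += PARTIAL.size
    return Frame(start_time, partials)
```

Because the partial data is single precision, values that are not exactly representable as 32-bit floats will change slightly on a round trip.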

[0033] At any instant of time, a timbral prototype is being synthesized as a
weighted sum of two constituent frames. Each of the two sets of frame data is
synthesized at a constant frequency and phase. Irrespective of their timestamps,
successive frames are 50% overlapped with individual amplitude envelopes linearly
increasing from zero to the specified peak amplitude value for the first overlapped
portion of the frame, and linearly decreasing from this peak back to zero during the second
portion of the frame. This is illustrated in Figure 3. The two sets of scaled, overlapped
frame partials are summed to constitute an output channel.

[0034] An important feature of this approach is that for individual generating
oscillators, the frequency, phase, and amplitude remain constant. By overlapping and
adding successive oscillators with the triangular amplitude envelope, two
fixed-frequency, fixed-amplitude sinusoids closely approximate a single
varying-frequency, varying-amplitude partial.
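
The overlap-add idea can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the fixed-point implementation: the function name `crossfade_hop` is hypothetical, both sinusoids are assumed to start at zero phase, and only the overlapped half-frame (one "hop") is rendered, during which the older frame ramps down while the newer frame ramps up.

```python
import math


def crossfade_hop(freq_a, amp_a, freq_b, amp_b, sample_rate, hop):
    """Render one overlapped half-frame of a single partial.

    Frame A (older) is on the falling half of its triangular envelope,
    frame B (newer) is on the rising half. Each sinusoid keeps a fixed
    frequency and a fixed peak amplitude; only the linear weights change.
    """
    out = []
    for n in range(hop):
        w_in = n / hop          # rising ramp for frame B
        w_out = 1.0 - w_in      # falling ramp for frame A
        a = amp_a * w_out * math.sin(2.0 * math.pi * freq_a * n / sample_rate)
        b = amp_b * w_in * math.sin(2.0 * math.pi * freq_b * n / sample_rate)
        out.append(a + b)
    return out


# A partial gliding from 440 Hz to 445 Hz while getting louder.
glide = crossfade_hop(440.0, 0.5, 445.0, 0.8, 44100.0, 256)
```

When both frames carry the same frequency and amplitude, the two ramps sum to one and the output reduces to a single steady sinusoid, which is a convenient sanity check.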

[0035] As previously indicated, there are many ways to generate sinusoids. The
most common methods include various recursive techniques, table look-up, and
transform domain methods, such as those using the inverse fast Fourier transform. The
present invention relies upon recursive techniques due to their heavy reliance on
explicitly parallel arithmetic with fewer time-consuming memory accesses. In
accordance with an embodiment of the invention, the following digital resonator, with no
damping or initialization impulse function, is used:

$$x_n = 2\cos\left(\frac{2\pi f}{f_s}\right) x_{n-1} - x_{n-2}$$

with $f_s$ as the sampling frequency, and $f \in (0, f_s/2)$ as the desired (constant) frequency
of oscillation.
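
A minimal floating-point sketch of this resonator follows, assuming the recurrence x[n] = 2 cos(2 pi f / f_s) x[n-1] - x[n-2] and seeding the two state variables directly with sine evaluations rather than an impulse. The function name `recursive_sine` is illustrative only, and double-precision floats stand in for the fixed-point arithmetic discussed in the text.

```python
import math


def recursive_sine(freq_hz, sample_rate_hz, num_samples, phase=0.0):
    """Generate sin(phase + n * omega) for n = 0 .. num_samples - 1."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)              # the only per-oscillator constant
    x_prev1 = math.sin(phase - omega)          # x[-1]
    x_prev2 = math.sin(phase - 2.0 * omega)    # x[-2]
    out = []
    for _ in range(num_samples):
        x = coeff * x_prev1 - x_prev2          # one multiply, one subtract per sample
        out.append(x)
        x_prev2, x_prev1 = x_prev1, x
    return out


samples = recursive_sine(440.0, 44100.0, 256)
reference = [math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(256)]
max_error = max(abs(a - b) for a, b in zip(samples, reference))
```

Over a short run such as this the recurrence tracks the directly evaluated sine closely; rounding error accumulates as the run gets longer, since nothing in the recurrence corrects it.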



[0036] To implement this equation using only sixteen-bit fixed-point multiplies, it
is necessary to (1) manage the fixed-point units with enough precision to maintain
accuracy across the entire audible frequency range, while (2) taking special care to
provide sufficient frequency coefficient resolution to account for human ability to
distinguish subtle differences in low frequencies. Accuracy must be maintained across a
broader range and with more precision for low-frequency partials than a simple
sixteen-bit fixed-point representation supplies. Additionally, because the frequency
coefficient multiplication is in the critical path, it is desirable to minimize the
computational overhead of the changes.

[0037] To quantify the issue, the minimum perceptible musical interval is
specified. Afterwards, the resolution necessary to maintain relative frequency accuracy is
calculated. Doing so indicates that the low-frequency components require more precision
than higher ones, which is intuitive, since relative accuracy is being calculated. Thus, to
minimize perceived error, the frequency coefficient representations are re-mapped in two
ways: by employing an exponent internally to emulate floating-point range extension, and
by inverting the bit representation to bias accuracy toward low frequencies. These
changes require two new operations per filter per sample: an add with constant shift and a
variable shift.






[0038] To understand the modifications to the filter, recall the original recurrence
relation for the sine wave generator (with $\omega = 2\pi f / f_s$). At low frequency, the coefficient
$2\cos(\omega)$ is very close to two, and so in a floating-point format, lower frequencies
synthesized using the formula will have less accuracy than higher frequencies due to the
need to explicitly represent the leading ones in the mantissa. Numbers closer to zero
benefit from the implicit encoding of leading zeros via a smaller exponent. In other
words, larger values require bits with larger "significance" (absolute value), forcing the
least significant bits in the same word to also have higher significance, thus forcing higher
worst-case quantization error. One can more effectively use the bits of the mantissa by
reversing this relationship, recasting the equation as:

$$x_n = 2\cos(\omega)\, x_{n-1} - x_{n-2}$$

$$x_n = 2(1 - \epsilon/2)\, x_{n-1} - x_{n-2}$$

$$x_n = 2x_{n-1} - \epsilon\, x_{n-1} - x_{n-2}$$

i.e., where $\cos(\omega) = 1 - \epsilon/2$.
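
The recast form can be checked numerically. The sketch below assumes the substitution cos(omega) = 1 - epsilon/2, so that epsilon = 2 - 2 cos(omega), and compares one step of the standard recurrence with one step of the recast recurrence. The function names are illustrative. Epsilon is computed as 4 sin^2(omega/2), which is algebraically identical to 2 - 2 cos(omega) but avoids cancellation when omega is small.

```python
import math


def epsilon_for(freq_hz, sample_rate_hz):
    """Return epsilon = 2 - 2*cos(omega), computed as 4*sin(omega/2)**2."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return 4.0 * math.sin(omega / 2.0) ** 2


def step_standard(omega, x_prev1, x_prev2):
    """x[n] = 2*cos(omega)*x[n-1] - x[n-2]; the coefficient sits near 2 at low frequency."""
    return 2.0 * math.cos(omega) * x_prev1 - x_prev2


def step_recast(eps, x_prev1, x_prev2):
    """x[n] = 2*x[n-1] - eps*x[n-1] - x[n-2]; eps sits near 0 at low frequency."""
    return 2.0 * x_prev1 - eps * x_prev1 - x_prev2


# Epsilon grows from 0 at DC to 4 at the Nyquist frequency.
for freq in (27.5, 440.0, 10000.0, 22050.0):
    print(freq, epsilon_for(freq, 44100.0))
```

The point of the rewrite is visible in the printed values: low frequencies map to small epsilon, which a mantissa-plus-exponent format can represent with many significant bits.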

[0039] To represent $\epsilon$, an unsigned sixteen-bit mantissa $m$ is combined with an
unsigned exponent $e$, biased so that the actual represented value is $\epsilon = 2^{2-e}\, m$. Thus, the
exponent is also the right shift amount necessary to correct a 16b x 16b -> 32b multiply
with $\epsilon$ as an operand. The two in the exponent allows $\epsilon$ to range from 0 to 4 when $m$ is
interpreted as a fractional amount and $f$ ranges between zero and the Nyquist frequency.
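
One way to picture this representation is the following Python sketch, which splits a coefficient epsilon in [0, 4) into a 16-bit mantissa and a shift-count exponent so that epsilon is approximately 2^(2 - e) * m / 2^16. The normalization rule (shift until the top mantissa bit is set), the truncating conversion, and the cap `MAX_EXPONENT` are assumptions of this sketch, not details taken from the specification.

```python
MANTISSA_BITS = 16
MAX_EXPONENT = 31   # hypothetical cap on the shift amount


def encode_epsilon(eps):
    """Return (mantissa, exponent) with eps ~= 2**(2 - exponent) * mantissa / 2**16."""
    if not 0.0 <= eps < 4.0:
        raise ValueError("eps must lie in [0, 4)")
    fraction = eps / 4.0        # now in [0, 1)
    exponent = 0
    # Normalize: double until the top mantissa bit would be set, or the cap is hit.
    while fraction < 0.5 and exponent < MAX_EXPONENT:
        fraction *= 2.0
        exponent += 1
    mantissa = int(fraction * (1 << MANTISSA_BITS))   # truncate to 16 bits
    return mantissa, exponent


def decode_epsilon(mantissa, exponent):
    """Inverse mapping, in floating point."""
    return 4.0 * mantissa / (1 << MANTISSA_BITS) / (1 << exponent)


def multiply_by_epsilon(mantissa, exponent, x):
    """Integer eps * x: a 16b x 16b -> 32b product followed by one right shift.

    The shift is a constant part (16 - 2) plus the variable exponent.
    """
    return (mantissa * x) >> (MANTISSA_BITS - 2 + exponent)


m, e = encode_epsilon(1.5e-5)           # a low-frequency coefficient
relative_error = abs(decode_epsilon(m, e) - 1.5e-5) / 1.5e-5
```

With a normalized mantissa the relative quantization error stays below 2^-15 regardless of how small epsilon is, which is the property the exponent is there to provide.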

[0040] What is achieved with this re-mapping of number representation (denoted
as Re-Mapping for economy of language) is the ability to represent lower frequencies
with more significant bits and higher frequencies with less significant bits. In
particular, as $2\cos(2\pi f/f_s)$ varies from -2 to 2, $\epsilon$ is defined to vary from 4 to 0. Smaller
frequency values produce smaller values of $\epsilon$, helping to satisfy asymmetric accuracy
requirements of the human auditory system.

[0041] Initialization can be quickly accomplished in accordance with the
invention. In particular, the resonator can be initialized to a desired frequency and phase
at sample $x_0$ by properly choosing the two state variables $x_{-2}$ and $x_{-1}$ using function
evaluations in place of an initialization forcing function. The lookup values for a
sinusoid with phase $p$ and frequency $f$ are:

$$x_{-1} = \sin\left(p - \frac{2\pi f}{f_s}\right); \quad x_{-2} = \sin\left(p - 2\,\frac{2\pi f}{f_s}\right)$$

These initializations must be accurate down to the low-order bits in a 32-bit fixed point
representation, with the binary point set between the third and fourth bit positions in
order to support a phase in the range $[0, 2\pi]$. In addition, it is necessary to compute the
frequency coefficient $2 - 2\cos(\omega)$ to 32-bit accuracy.



[0042] These initial evaluations can be computed more quickly by rewriting the
equations for $x_{-1}$ and $x_{-2}$ in a form that requires only the computation of $\sin(p)$, $\cos(p)$,
$\sin(\omega)$, and $\cos(\omega)$:

$$x_{-1} = \sin(p - \omega) = \sin(p)\cos(\omega) - \cos(p)\sin(\omega)$$

$$x_{-2} = \sin(p - 2\omega) = \sin(p)\cos(2\omega) - \cos(p)\sin(2\omega) = 2\cos(\omega)\sin(p - \omega) - \sin(p) = 2\cos(\omega)\, x_{-1} - \sin(p)$$
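
The rewritten initialization can be sketched as follows, assuming phase p and normalized angular frequency omega in radians per sample. The function name `init_state` is illustrative, and the library sine and cosine stand in for the tandem lookup routine described in the text. Only sin(p), cos(p), sin(omega), and cos(omega) are evaluated; both state variables are built from those four values.

```python
import math


def init_state(phase, omega):
    """Return (x[-1], x[-2]) for an oscillator that produces sin(phase) at n = 0."""
    sin_p, cos_p = math.sin(phase), math.cos(phase)
    sin_w, cos_w = math.sin(omega), math.cos(omega)
    x_m1 = sin_p * cos_w - cos_p * sin_w     # sin(p - omega), angle-difference identity
    x_m2 = 2.0 * cos_w * x_m1 - sin_p        # sin(p - 2*omega), reusing x[-1]
    return x_m1, x_m2


p, w = 0.7, 0.05
x_m1, x_m2 = init_state(p, w)
x_0 = 2.0 * math.cos(w) * x_m1 - x_m2       # first output of the recurrence
```

Running the recurrence one step from these values returns sin(p), confirming that the oscillator starts at the requested phase.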



[0043] It may seem that this has actually increased the amount of work to be
performed because there are now four trigonometric evaluations rather than three (two
initialization sines plus the cosine in the recursive form). However, this approach turns
out to be more efficient by allowing for the judicious sharing of intermediate values in a
tandem sine and cosine generation procedure. The tandem subroutine returns both $\sin(\theta)$
and $\cos(\theta)$ for $\theta \in [0, 2\pi]$ to full 32-bit fixed-point precision using a hybrid technique
combining table-lookup and Taylor expansion. This keeps both the table size
manageable (2048 entries of 32 bits) and the number of terms in the Taylor expansions
small (two). It is implemented by separating $\theta$ into $\alpha$ and $\beta$ as shown in Figure 4; $\alpha$ is the
high-order 11 bits of $\theta$, and $\beta$ the remaining low-order bits. $\alpha$ is used in exact (to one
LSB) 11-bit to 32-bit table lookups, while (guaranteed small) $\beta$ is used in Taylor
expansions:

$$\cos(\beta) \approx 1 - \frac{\beta^2}{2} \quad \text{and} \quad \sin(\beta) \approx \beta - \frac{\beta^3}{6}$$

The accuracy of expanding each to only two terms is guaranteed by limiting the size of $\beta$
to only the low-order 21 bits of $\theta$: the sum of the remaining terms in each expansion
sequence, for all $\beta$, is less than the LSB. Finally, $\alpha$ and $\beta$ are combined using the
relationships:

$$\sin(\alpha+\beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta)$$

$$\cos(\alpha+\beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$$
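
A hedged Python sketch of such a tandem sine/cosine routine follows. It assumes the angle arrives as an unsigned 32-bit fixed-point value with 3 integer bits and 29 fractional bits (radians), takes the high-order 11 bits as a table index, and treats the low-order 21 bits as a small residual angle handled by two-term Taylor expansions. The names `tandem_sin_cos`, `SIN_TABLE`, and `COS_TABLE` are illustrative, and the tables hold Python floats rather than 32-bit fixed-point words.

```python
import math

FRAC_BITS = 29                 # assumed Q3.29 layout: binary point after the third bit
BETA_BITS = 21                 # low-order bits handled by Taylor expansion
TABLE_SIZE = 1 << 11           # 2048 entries indexed by the high-order 11 bits
ALPHA_STEP = 2.0 ** -(FRAC_BITS - BETA_BITS)   # angle per table entry: 2**-8 rad

SIN_TABLE = [math.sin(i * ALPHA_STEP) for i in range(TABLE_SIZE)]
COS_TABLE = [math.cos(i * ALPHA_STEP) for i in range(TABLE_SIZE)]


def tandem_sin_cos(theta_fixed):
    """Return (sin(theta), cos(theta)) for a 32-bit fixed-point angle."""
    alpha_index = theta_fixed >> BETA_BITS
    beta = (theta_fixed & ((1 << BETA_BITS) - 1)) * 2.0 ** -FRAC_BITS  # < 2**-8 rad

    sin_a, cos_a = SIN_TABLE[alpha_index], COS_TABLE[alpha_index]
    sin_b = beta - beta ** 3 / 6.0        # two-term Taylor expansion
    cos_b = 1.0 - beta ** 2 / 2.0         # two-term Taylor expansion

    # Angle-sum identities combine the table values with the small-angle terms.
    sin_t = sin_a * cos_b + cos_a * sin_b
    cos_t = cos_a * cos_b - sin_a * sin_b
    return sin_t, cos_t


theta_fixed = int(1.2345 * 2 ** FRAC_BITS)
s, c = tandem_sin_cos(theta_fixed)
```

Because the residual angle is below 2^-8 radians, the first neglected Taylor terms are far smaller than one least significant bit (2^-29) of the assumed format.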

[0044] Attention now turns to an error analysis performed in accordance with an
embodiment of the invention. Relative frequency discrimination is based on the ratio of
adjacent frequencies. To determine worst-case relative error, it is desirable to determine
the maximum ratio between two adjacent $\epsilon$ values. Call these frequency coefficients $\epsilon_1$ and
$\epsilon_2$, and their corresponding frequencies $f_1$ and $f_2$. From the definition of $\omega$ and the
equation $\cos(\omega) = 1 - \epsilon/2$,

$$\frac{f_1}{f_2} = \frac{\cos^{-1}(1 - \epsilon_1/2)}{\cos^{-1}(1 - \epsilon_2/2)}$$

[0045] Taking any two adjacent numbers in the range, one can compute $f_1$ and $f_2$
with the foregoing equation. Evaluating this ratio for all possible adjacent pairs of
epsilon values allows one to determine that it is maximized for $\epsilon_1 = 4 - 2^{-14}$ and $\epsilon_2 = 4 - 2^{-13}$,
where $f_1/f_2 = 1.0010337$. This ratio is lower than the minimum frequency ratio humans
are able to differentiate, a pitch difference of approximately four to five cents (about
1/25 to 1/20 of a semitone). The maximum error of the algorithm is actually less than two
cents: $2^{2/1200} \approx 1.001156$.
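
The quoted worst-case figure can be reproduced with a short calculation. The sketch below assumes the relation cos(omega) = 1 - epsilon/2 and evaluates the frequency ratio only for the coefficient pair 4 - 2^-14 and 4 - 2^-13; it does not repeat the exhaustive search over all adjacent pairs. The function names are illustrative.

```python
import math


def frequency_ratio(eps_1, eps_2):
    """Ratio f1/f2 of the frequencies represented by two coefficients."""
    return math.acos(1.0 - eps_1 / 2.0) / math.acos(1.0 - eps_2 / 2.0)


def ratio_to_cents(ratio):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


ratio = frequency_ratio(4.0 - 2.0 ** -14, 4.0 - 2.0 ** -13)
cents = ratio_to_cents(ratio)
two_cent_ratio = 2.0 ** (2.0 / 1200.0)
print(ratio, cents, two_cent_ratio)
```

The computed ratio falls below the two-cent ratio, consistent with the statement that the worst-case relative error is under two cents.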

[0046] This calculation illustrates that by designing higher $\epsilon$ values to coincide
with higher represented frequencies, a good match of the numerical representation to the
asymmetric accuracy requirements of human logarithmic pitch perception is achieved.

[0047] Two tones that are meant to have an exact ratio in their frequencies may 
instead generate beat frequencies due to frequency quantization. This effect, caused by 
absolute error, should be minimized. 

[0048] Worst-case absolute error due to epsilon quantization is shown in Figure 5, 
which contains a side-by-side comparison below 2000 Hz for an original signal 100 and a 
modified signal 102. Figure 6 is a more detailed representation of the modified 
signal 102. As expected, the recast filter maintains more precise absolute frequency than 
the original form. 

[0049] Fundamentally, more than 16 bits of fractional coefficient are necessary to
obtain 1 Hz absolute precision across the audible spectrum. The method of the invention
maintains reasonable error bounds in sixteen bits of mantissa by scaling these bits with
the exponent.

[0050] At each iteration of the recursive form, a small error is introduced due to
rounding of the multiply result. Due to the recursion, this error isn't corrected until
reinitialization of the state variables. Possible effects of this include degradation of the
signal-to-noise ratio, degradation of the long-term phase accuracy, and a lack of
amplitude stability, all of which can cause audible artifacts.

[0051] These problems can be corrected via additional computations, but to avoid
additional computations, the invention exploits the ability to reinitialize the computation
at overlap-add frame boundaries, thereby allowing use of a non-self-correcting (but
higher-performance) digital oscillator form described below.

[0052] In one embodiment, the invention was implemented on a neural network
and signal processing accelerator board. This embodiment included a T0 chip, a 16-bit
fixed point vector arithmetic core developed by the University of California at Berkeley
and the International Computer Science Institute. The T0 chip tightly couples a
general-purpose scalar MIPS core to a high-performance vector coprocessor. T0 is
representative of digital signal processing architectures in its use of fixed-point
arithmetic. In order to compute summations of oscillators for additive synthesis with an
overlap-add approach for a pseudo-floating-point format, four multiplies, two variable
shifts, two fused (constant) shifts and adds, and two regular adds are required:

$$x_n = 2x_{n-1} - \epsilon\, x_{n-1} - x_{n-2}$$

$$A_n = A_{n-1} + \Delta A$$

$$\mathrm{out}_i = \mathrm{out}_i + A_n \times x_n$$

A coded module implementing the foregoing expressions constitutes an additive
synthesizer 34 in accordance with the invention. Observe that this additive synthesizer 34
incorporates prior art components of additive synthesis (the idea of using an analysis step
followed by a re-synthesis step), and the general approach of using recursive oscillators.
However, the additive synthesizer 34 represents a new formulation of prior art recursive
oscillation techniques in its use of a modified filter equation, its use of modified
coefficient representations, and the explicit consideration of the human auditory system to
provide additional computational efficiency.
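
The per-sample work of such a synthesizer can be sketched in floating-point Python. This is an illustration under stated assumptions, not the fixed-point vector code: it assumes the recast recurrence x[n] = 2 x[n-1] - epsilon x[n-1] - x[n-2], a linear amplitude ramp A[n] = A[n-1] + delta A, and accumulation of A[n] x[n] into an output buffer. The `Oscillator` class and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Oscillator:
    eps: float        # frequency coefficient, 2 - 2*cos(omega)
    x1: float         # x[n-1]
    x2: float         # x[n-2]
    amp: float        # A[n-1]
    amp_step: float   # delta A added every sample


def make_oscillator(freq_hz, sample_rate_hz, phase, amp, amp_step):
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return Oscillator(
        eps=4.0 * math.sin(omega / 2.0) ** 2,   # equals 2 - 2*cos(omega)
        x1=math.sin(phase - omega),
        x2=math.sin(phase - 2.0 * omega),
        amp=amp,
        amp_step=amp_step,
    )


def synthesize(oscillators, num_samples):
    """Sum all oscillators into one output buffer."""
    out = [0.0] * num_samples
    for osc in oscillators:
        for i in range(num_samples):
            x = 2.0 * osc.x1 - osc.eps * osc.x1 - osc.x2   # x[n]
            osc.amp += osc.amp_step                        # A[n] = A[n-1] + delta A
            out[i] += osc.amp * x                          # out[i] += A[n] * x[n]
            osc.x2, osc.x1 = osc.x1, x
    return out


steady = synthesize([make_oscillator(1000.0, 44100.0, 0.0, 0.5, 0.0)], 64)
rising = synthesize([make_oscillator(1000.0, 44100.0, 0.0, 0.0, 0.01)], 4)
```

In a vectorized setting the outer loop over oscillators is the natural axis to parallelize, since each oscillator's state is independent of the others.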

[0053] On T0, the implementation of the additive synthesizer 34 requires a total of
$9 + 1/n$ vector arithmetic operations per sinusoid when unrolled $n$ times. Unrolling four
times due to trade-offs in register file pressure on T0, one achieves best-case performance
of about 1.15 cycles/partial: two fixed-frequency sinusoids are required per
variable-frequency partial because of overlap-add, 9 operations are required per sine, two
cycles are required per vector operation on T0, and the vector length is 32 elements.
Thus, in this embodiment, performing 8 operations per cycle (peak) with a 40 MHz clock
rate and at a 44.1 kHz sampling rate, a theoretical maximum of 768 partials can be
achieved in real time excluding all overhead. The current implementation supports up to
608 simultaneous real-time partials with frame lengths of 5.8 ms or greater, or about 1.5
cycles per partial per sample.
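
The arithmetic behind the cycles-per-partial figures can be laid out explicitly. The sketch below assumes the operation count is 9 + 1/n vector operations per sinusoid when the loop is unrolled n times, and uses the stated figures of two sinusoids per partial, two cycles per vector operation, a vector length of 32, a 40 MHz clock, and a 44.1 kHz sampling rate. The function name and parameter names are illustrative.

```python
def cycles_per_partial(unroll, ops_per_sine=9, sines_per_partial=2,
                       cycles_per_vector_op=2, vector_length=32):
    """Best-case cycles per partial per sample under the stated assumptions."""
    ops = ops_per_sine + 1.0 / unroll          # assumed 9 + 1/n operations
    return sines_per_partial * ops * cycles_per_vector_op / vector_length


best_case = cycles_per_partial(4)              # unrolled four times

# Cycle budget per output sample at a 40 MHz clock and 44.1 kHz sampling rate.
cycles_per_sample = 40e6 / 44100.0

# Measured figure: 608 real-time partials share that per-sample budget.
measured_cycles_per_partial = cycles_per_sample / 608
print(best_case, cycles_per_sample, measured_cycles_per_partial)
```

The best-case value corresponds to the "about 1.15 cycles/partial" figure, and dividing the per-sample cycle budget by 608 partials gives roughly the 1.5 cycles per partial per sample reported for the implementation.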

[0054] The invention may also be implemented on a Digital Signal Processor.
Although Digital Signal Processors typically do not have flexibly configured vector
pipelines, these processors support several different vector operand sizes, including single
precision floating point, 16-bit and 32-bit fixed point. Operand size and coefficient
alignment can be exploited according to desired frequency and amplitude of each
sinusoidal signal sequence.

[0055] The invention may also be implemented in Field Programmable Gate
Arrays (FPGAs). Such processors allow for the creation of new, specialized arithmetic
operations on a per instruction and per sinusoidal sequence basis. This allows use of
lattice filter structures and sinusoidal synthesis algorithms, such as quantizers, error
feedback, and non-linear operations, which are presently limited to custom hardware
processors.

[0056] Very Long Instruction Word (VLIW) processors may also be used to
implement the invention. These processors have multiple concurrent arithmetic units, but
use long instructions to control them rather than the vector processor's limited, but
compact, vector instructions. Like vector processors, VLIW processors benefit from
algorithms exhibiting good locality of reference. The simplicity and regularity of the
second order recursive kernels used in this invention allow the VLIW compiler to
efficiently map the algorithm to a particular VLIW processor and, more importantly,
allow for effective code generation in applications where other algorithms are performed
concurrently with the sinusoidal models, such as the high level parametric control
structures for models.

[0057] The invention may also be implemented in RISC processors. Performance
of these processors depends on instruction order and cache utilization, both of which can
be optimized on the basis of desired frequency and phase to most accurately and
efficiently compute the approximating sinusoidal sequences.

[0058] Since transform domain methods also work well for these superscalar
RISC processors, it is necessary to consider further advantages of the present invention
over transform domain methods. The first advantage is computational: since in this
invention the sinusoids are computed directly and individually, no cost for a final
transform is incurred. This cost is especially significant when multiple independent
channels of summed sinusoids are required since a transform is required for each channel.
Transform domain methods cannot output elements of the output sequence until the entire
transform is performed. The resulting latency is avoided in this invention because each
element of the sinusoidal sequence may be stitched to its predecessor sequence and be
driven as output as soon as it is computed.

[0059] Another advantage of the invention is in connection with cache memory
utilization. This invention does not require a tabulated frequency domain window
function at all, and the triangular window function it does require for the stitching need
not be tabulated as it may be computed with sufficient accuracy by accumulation. This
invention therefore affords a straightforward implementation of the window stitching
operations for any sequence length. Transform domain methods favor window sizes
which are powers of 2 or 3 and require considerable complexity to dynamically change
window sizes.

[0060] The invention may also be implemented on processors using a Residue
Number System. These processors are not widely deployed because of the high cost of
conversion of numbers from traditional 2's complement representation. This problem is
largely avoided with this invention since only the coefficients need to be converted for
each sinusoidal sequence. The sequences themselves can be efficiently computed using
Residue Number System arithmetic.

[0061] The invention may also be implemented on processors with a complex
arithmetic kernel. Such processors efficiently implement a vector rotation as a single
complex multiply. If the norm of a constant multiplicand is set to unity, a first-order,
complex vector rotation is mathematically equivalent to a second order real coefficient
system. In practice, the complex arithmetic kernel may be superior because it exhibits
smaller quantization errors.
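
The equivalence between a first-order complex rotation and the second-order real recurrence can be illustrated with Python's built-in complex type. This is a floating-point sketch with an illustrative function name: a state value is multiplied once per sample by a unit-norm constant, and the imaginary part of the state traces out the sinusoid.

```python
import cmath
import math


def complex_oscillator(freq_hz, sample_rate_hz, num_samples, phase=0.0):
    """Generate sin(phase + n * omega) by repeated complex rotation."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    rotation = cmath.exp(1j * omega)   # unit-norm constant multiplicand
    state = cmath.exp(1j * phase)      # starting point on the unit circle
    out = []
    for _ in range(num_samples):
        out.append(state.imag)         # imaginary part is the sine output
        state *= rotation              # one complex multiply per sample
    return out


rotated = complex_oscillator(440.0, 44100.0, 256)
direct = [math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(256)]
max_difference = max(abs(a - b) for a, b in zip(rotated, direct))
```

The real part of the same state yields the matching cosine at no extra cost, at the price of four real multiplies per complex multiply.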

[0062] The foregoing description, for purposes of explanation, used specific
nomenclature to provide a thorough understanding of the invention. However, it will be
apparent to one skilled in the art that the specific details are not required in order to
practice the invention. In other instances, well known circuits and devices are shown in
block diagram form in order to avoid unnecessary distraction from the underlying
invention. Thus, the foregoing descriptions of specific embodiments of the present
invention are presented for purposes of illustration and description. They are not
intended to be exhaustive or to limit the invention to the precise forms disclosed;
obviously, many modifications and variations are possible in view of the above teachings.
The embodiments were chosen and described in order to best explain the principles of the
invention and its practical applications, to thereby enable others skilled in the art to best
utilize the invention and various embodiments with various modifications as are suited to
the particular use contemplated. It is intended that the scope of the invention be defined
by the following claims and their equivalents.






